BUG: fix block fragmentation in DataFrame.astype with dict dtype #63461
Closed
Chiwendaiyue wants to merge 8 commits into pandas-dev:main
…o FIX_astype_block
Contributor
Author
I've implemented a fix that consolidates blocks only when the block count explodes (currently, when the number of blocks equals the number of columns). I'm unsure whether this threshold is optimal; it feels somewhat subjective. Could a maintainer suggest a better criterion? Thanks!
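To make the threshold question concrete, the condition can be isolated as a plain predicate. This is only a sketch: `should_consolidate` and `min_cols` are hypothetical names for illustration, not pandas API, and the cutoff of 5 is copied from the diff under review.

```python
def should_consolidate(n_blocks: int, n_cols: int, min_cols: int = 5) -> bool:
    """Hypothetical stand-in for the PR's check: consolidate only when
    every column sits in its own block and the frame has more than
    `min_cols` columns."""
    return n_blocks == n_cols and n_cols > min_cols


# Fully fragmented 10-column frame: the check fires.
print(should_consolidate(n_blocks=10, n_cols=10))   # True
# 100 columns spread over 99 blocks: still badly fragmented, but skipped.
print(should_consolidate(n_blocks=99, n_cols=100))  # False
# Fully fragmented but small frame: skipped by the size cutoff.
print(should_consolidate(n_blocks=5, n_cols=5))     # False
```

The second case shows why the criterion feels subjective: a strict equality test misses frames that are almost, but not exactly, one block per column.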
rhshadrach
requested changes
Dec 24, 2025
Comment on lines +6530 to +6534
    warnings.warn(
        f"astype block consolidation failed: {type(e).__name__}",
        UserWarning,
        stacklevel=2,
    )
    total_cols = len(self.columns)
    # only when the number of blocks explode do this
    if current_blocks == total_cols and total_cols > 5:
        mgr._consolidate_inplace()
Member
This is still creating a very fragmented DataFrame and then performing a copy. We would prefer not fragmenting the DataFrame at all in the first place (I think this should be possible).
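One way to picture "not fragmenting in the first place" is to bucket columns by their target dtype before any casting happens, so that each bucket could in principle be converted as a single unit. The sketch below is plain Python and does not touch pandas internals: `group_columns_by_dtype` is a hypothetical helper, and dtype strings stand in for real dtype objects.

```python
from collections import defaultdict


def group_columns_by_dtype(columns, dtype_map, current_dtypes):
    """Hypothetical helper: map each target dtype to the columns that
    should end up with it. Columns missing from `dtype_map` keep their
    current dtype. Column order is preserved within each group."""
    groups = defaultdict(list)
    for col in columns:
        target = dtype_map.get(col, current_dtypes[col])
        groups[target].append(col)
    return dict(groups)


columns = ["a", "b", "c", "d"]
current = {"a": "int64", "b": "int64", "c": "object", "d": "float64"}
requested = {"a": "float64", "b": "float64"}

groups = group_columns_by_dtype(columns, requested, current)
print(groups)
# {'float64': ['a', 'b', 'd'], 'object': ['c']}
```

Here four columns collapse into two groups, so a cast that works group by group would produce two blocks directly, with no fragmented intermediate and no follow-up copy. Whether the actual astype code path can be restructured this way is an open question for the PR.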
Member
Thanks for the pull request, but it appears to have gone stale. If interested in continuing, please merge in the main branch, address any review comments and/or failing tests, and we can reopen.
This PR fixes the issue by actively consolidating blocks of the same dtype after a dictionary-based astype operation. The fix is minimal and, I think, safe. (The alternative I considered is to determine and apply the correct block partitioning during the initial transformation.)
It adds a call to _consolidate_inplace() on the result's BlockManager when dtype is a dict. The consolidation is wrapped in a try-except block that emits a warning on failure, so it never breaks the core functionality of astype. Failures do not raise, and the behavior stays backward compatible.
I tried the Reproducible Example and it worked well. If there is any problem, I'm happy to fix it.
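The try-except-with-warning pattern described above can be sketched in isolation. This is a minimal illustration, not the PR's code: `consolidate_best_effort` is a hypothetical wrapper, the `consolidate` argument stands in for the call to `_consolidate_inplace()`, and catching `Exception` is an assumption, since the diff excerpt does not show the except clause.

```python
import warnings


def consolidate_best_effort(consolidate) -> bool:
    """Run `consolidate()`; on failure, warn instead of raising.

    Returns True if consolidation succeeded, False otherwise.
    """
    try:
        consolidate()
    except Exception as e:  # assumption: the exception type caught in the PR is not shown
        warnings.warn(
            f"astype block consolidation failed: {type(e).__name__}",
            UserWarning,
            stacklevel=2,
        )
        return False
    return True


def failing_consolidate():
    # Stand-in for a consolidation step that goes wrong.
    raise ValueError("simulated failure")


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    ok = consolidate_best_effort(failing_consolidate)

print(ok)                      # False
print(str(caught[0].message))  # astype block consolidation failed: ValueError
```

Note that a failure here is visible to the caller as a `UserWarning`, so it is non-fatal rather than strictly silent.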